Enriching Digitized Medieval Manuscripts: Linking Image, Text and Lexical Knowledge
Abstract
This paper describes an ongoing project to transcribe and annotate digitized manuscripts of medieval Spanish with paleographic and lexical information. We link lexical units from the manuscripts to the Multilingual Central Repository (MCR), making terms retrievable through any of the languages integrated in the MCR. The goal of the project is twofold: first, to create a paleographic knowledge base from digitized medieval facsimiles that will allow paleographers, philologists, historical linguists, and humanities scholars in general to analyze and retrieve graphemic, lexical, and textual information from historical documents; and second, to develop machine-readable documents that link image representations of graphemic and lexical units in a facsimile with Linked Open Data resources. This paper concentrates on the encoding and cross-linking procedures.
Related resources
Computational Analysis of Medieval Manuscripts: A New Tool for Analysis and Mapping of Medieval Documents to Modern Orthography
Medieval manuscripts and other written documents from that period contain valuable information about the people, religion, and politics of the medieval period, making the study of medieval documents a necessary prerequisite to gaining in-depth knowledge of medieval history. Although tool-less study of such documents is possible and has been ongoing for centuries, much subtle information remains loc...
Medieval Manuscripts, Hypertext and Reading. Visions of Digital Editions
How was a medieval manuscript meant to be read? This is a question that has concerned me for a long time in my work with Old Swedish manuscripts from Vadstena Abbey. In many manuscripts we can find traces of the historical reading situation; for example, pointing hands, marginal notes, etc. Such signals had an important function for the medieval reader, but they are rarely put forward in modern...
Feature Extraction in Segmented Words for Semi-automatic Transcription of Handwritten Arabic Documents
Scanning is a widely used solution for the preservation of ancient manuscripts. However, this solution yields masses of document images whose content is not easily exploitable. In this work, we propose a new method that considerably reduces manual transcription. The aim is to explore the content of digitized manuscripts. The proposed method is based on two main phases: the first one consists...
Specifying a TEI-XML Based Format for Aligning Text to Image at Character Level
This paper presents an experience of specifying and implementing an XML format for text-to-image alignment at word and character level within the TEI framework. The format in question is a supplementary markup layer applied to heterogeneous transcriptions of medieval Latin and French manuscripts encoded using different “flavors” of the TEI (normalized for critical editions, diplomatic or palae...
Automatic Algorithms for Medieval Manuscript Analysis
Massive digital acquisition and preservation of deteriorating historical and artistic documents is of particular importance due to their value and fragile condition. The study and browsing of such digital libraries is invaluable for scholars in the Cultural Heritage field, but requires automatic tools for analyzing and indexing these datasets. We will describe a set of completely automatic solu...